ABSTRACT

This paper identifies World Wide Web site characteristics 
and attributes and groups them in a hierarchy. The primary goal is to 
classify the elements that might be part of a quantitative evaluation and 
comparison process. In order to effectively select quality characteristics, 
different users' needs and behaviors are considered. Following an 
introduction, an overview of the Web-site QEM (Quality Evaluation Method) is 
presented. The following steps that evaluators should follow in applying the 
Web-site QEM are described: (1) selection of the site domain to evaluate or 
compare; (2) specification of goals and user standpoint; (3) definition of 
quality characteristics and attributes; (4) definition of attribute 
evaluation criteria and determination of elementary preferences; (5) 
aggregation of elementary preferences to yield the global quality preference; 
and (6) analysis and comparison of partial and global quality outcomes. More 
than 60 directly measurable Web site characteristics and attributes are then 
outlined regarding the visitor standpoint and sites ranging from museums and 
academic sites to electronic commerce domains. These are organized into the 
broad categories of usability, functionality, site reliability, and 
efficiency. (Contains 15 references.) (MES) 

This work specifies Web-site characteristics and attributes, grouping them in a hierarchy. The primary goal is to 
classify the elements (regarding standards) that might be part of a quantitative evaluation 
and comparison process. To select quality characteristics effectively, we should 
consider different users' needs and behaviors. Hence, we outline more than sixty directly 
measurable attributes regarding the visitor standpoint and site domains that could range 
from museums and academic sites to electronic commerce. We also discuss some 
metrics and give an overview of the Web-site Quality Evaluation Method. The 
results should be useful to understand, assess, control, and improve the quality of 
Web-based software artifacts. 

1. Introduction 

The sudden emergence of the Web around the world has brought rapid growth in the development of Web-based 
artifacts. However, as stressed elsewhere [Olsina 98a, Rossi 96], this growth of sites has not been accompanied 
by well-defined models that support development and evaluation activities, mainly in medium and large-scale 
projects. Thus, an engineering approach to help in the understanding, evaluation, and improvement of 
Web-based software products should be considered a mandatory requirement. One objective of Web-site 
evaluation is to find out the extent to which a given artifact characteristic or set of characteristics fulfills a 
selected set of requirements regarding a specific user view. In this sense, evaluation implies a logical 
decision-making process. 

Evaluation methods and techniques fall into two categories: qualitative and quantitative. Even though software 
evaluation has existed as a discipline for more than three decades, the systematic and quantitative quality 
evaluation of hypermedia applications, and particularly of Web sites, is a rather recent and often neglected 
issue. The authors in [Garzotto et al 97] have introduced evaluation criteria such as richness and consistency, 
among others, to evaluate hypermedia applications in a qualitative way. However, this approach is well suited 
only when the evaluation problem is rather simple and intuitive. In cases with many elementary attributes, it is 
difficult to evaluate accordingly, and it is hard to identify minor differences between similar systems under 
comparison. 

Moreover, in the last three years Web-site style guides and design principles have emerged to assist 
developers in the process [IEEE 99, Nielsen 99, Rosenfeld et al 98], as well as lists of guidelines that authors 
should follow in order to make sites more accessible [W3C 99]. These guidelines and techniques have brought 
insight into essential characteristics and attributes and might improve the Web-site design process but, 
obviously, do not constitute evaluation methods by themselves. In addition, quantitative surveys [Nielsen 99] 
and domain-specific evaluations for electronic commerce have recently emerged [Lohse et al 98]. Specifically, 
Lohse & Spiller identified and measured over 30 attributes that influence store traffic and sales. However, we 
still need a broad, engineering-based method to assess complex quality requirements. 

The aim of this work is to classify, in a standard-compliant way [IEEE 92, ISO 91], characteristics and 
attributes that might be part of a quantitative evaluation process. To select quality characteristics 
effectively, we should consider different kinds of users. We represent many characteristics and 
sub-characteristics, and more than sixty measurable attributes regarding the visitor standpoint and domains that 
could range from presentation and academic sites to electronic commerce domains. In addition, we explain 
some elementary evaluation criteria. The results of applying the proposed method (Web-site Quality 
Evaluation Method) might contribute to understanding, and potentially improving, the quality of sites. 

In the following section, we present the main activities that evaluators should perform when 
applying the Web-site QEM. Next, we represent characteristics and attributes regarding the general visitor 
viewpoint and show some metrics. Finally, we present concluding remarks and future directions. 



2. Overview Of The Web-Site QEM 

In order to effectively select quality characteristics and attributes, we should first consider the site domain, 
evaluation goals, and different stakeholders' requirements. After considering these steps, the primary objective 
is to group characteristics and attributes that might be part of the evaluation and comparison process. To give 
insight into the overall process, we outline and describe the main steps that evaluators should follow when 
applying the Web-site QEM, namely: 

• Selection of the Site Domain to Evaluate or Compare 

• Specification of Goals and User Standpoint 

• Definition of Quality Characteristics and Attributes 

• Definition of Attribute Evaluation Criteria, and determination of Elementary Preferences 

• Aggregation of Elementary Preferences to yield the Global Quality Preference 

• Analysis and Comparison of Partial and Global Quality Outcomes 

Step one. Selection of the Web Information System (WIS) domain: first, the evaluators should know which 
software domain is to be evaluated or compared. For instance, for a given WIS or sub-system we might 
emphasize usability more than security, or both, depending on the specific situation. In electronic 
commerce, security is an essential characteristic, but in an academic site it is less important. Besides, if the goal 
is to perform a case study to compare the quality of sites, we should select typical ones in order to be 
successful throughout the process. 

Step two. Specification of Goals and User Standpoint: in this activity, the decision-makers should define 
the goals and scope of the evaluation process. The results might be useful to understand, control, or improve 
the quality of Web artifacts. The evaluators could evaluate a new or an operational project, the quality 
of a subsystem or a whole system, or compare global preferences of competitive systems. In addition, the 
relative importance of quality characteristics varies depending on the different users. Therefore, we define user 
views (as we will see in the next section). 

Step three. Definition of Web-site Quality Characteristics and Attributes: in this step, the evaluators should 
define, categorize, and specify the quality characteristics and attributes, grouping them into a requirement tree. 
In order to follow well-known standards, the same conceptual characteristics or factors as in [IEEE 92, ISO 91] 
are used; i.e., Usability, Functionality, Reliability, Efficiency, Portability, and Maintainability. 
From these, sub-characteristics are derived, and, in turn, measurable attributes can be specified. With each 
attribute A_i, a variable X_i is associated that takes a real value, i.e., the measured value. This hierarchical 
decomposition of characteristics into sub-characteristics and measurable attributes fits the 
software quality metric framework described in the IEEE Standard. 
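
As an illustration of such a requirement tree, the following minimal Python sketch represents characteristics, sub-characteristics, and attributes as nodes, where a node without children is treated as a measurable attribute. The class name `Node` and its methods are hypothetical and not part of Web-site QEM; the codes and names are a small excerpt of the Usability branch of the requirement tree in this paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A characteristic, a sub-characteristic, or (if it has no children) an attribute."""
    code: str
    name: str
    children: List["Node"] = field(default_factory=list)

    def is_attribute(self) -> bool:
        # Assumption of this sketch: leaves of the tree are the measurable attributes.
        return not self.children

    def attributes(self) -> List["Node"]:
        """Return the measurable attributes below this node, in tree order."""
        if self.is_attribute():
            return [self]
        leaves: List["Node"] = []
        for child in self.children:
            leaves.extend(child.attributes())
        return leaves


# Excerpt of the Usability branch (not the complete tree).
usability = Node("1", "Usability", [
    Node("1.1", "Global Site Understandability", [
        Node("1.1.1", "Global Organization Scheme", [
            Node("1.1.1.1", "Site Map"),
            Node("1.1.1.2", "Table of Content"),
        ]),
    ]),
    Node("1.4", "Miscellaneous Features", [
        Node("1.4.1", "Foreign Language Support"),
    ]),
])

leaf_codes = [node.code for node in usability.attributes()]
```

Here `leaf_codes` lists the attributes that would each receive a variable X_i in step four.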

Step four. Definition of the Evaluation Criterion for each Quantifiable Attribute, and Elementary 
Measurement: in this task, the evaluators should define the basis for elementary evaluation criteria and perform 
the measurement process. Elementary evaluation criteria say how to evaluate quantifiable attributes. The result 
is a rating, which can be interpreted as the degree to which a requirement is satisfied. For each variable X_i, 
i = 1, ..., n, it is necessary to establish an acceptable range of values and define a function, called the 
elementary criterion. This function maps the variable value (obtained from the empirical domain [Fenton et al 97]) 
into a new numerical domain; the result is called the elementary quality preference, EQ_i. The elementary quality 
preference EQ_i can be understood as the percentage of the requirement satisfied by the value of X_i. In this 
sense, EQ_i = 0% denotes a totally unsatisfactory situation while EQ_i = 100% represents a fully satisfactory 
situation. For each quantifiable attribute, the measurement activity should be carried out. 

Step five. Aggregation of Elementary Preferences to yield the Global Quality Preference: in this task, the 
evaluators obtain an indicator of global preference for each competitive system or for a single evaluated 
system. For n attributes, the corresponding elementary criterion functions produce n elementary quality 
preferences. By applying a stepwise aggregation mechanism, the elementary quality preferences can be grouped 
accordingly, which allows computing the global quality preference. The global quality preference represents the 
global degree of satisfaction of all involved requirements. (In two case studies we performed, the Logic Scoring 
of Preference (LSP) model was used [Dujmovic 96]. The strength of LSP resides in its power to model 
simultaneity, replaceability, and other attribute relationships using logic aggregation operators.) 

Step six. Analysis and Comparison of Partial and Global Quality Outcomes: in this final step, the evaluators 
assess the partial and total quantitative quality preferences regarding the stated goals and user standpoint. Thus, 
specific recommendations can be given to the requester. 



3. Representation Of Characteristics And Attributes 

3.1 Web-site Quality Characteristics and Attributes Tree 

In this section, we focus on defining and categorizing a wide set of Web-site quality characteristics and 
attributes. Specifically, by applying the third process step, the evaluators group characteristics and attributes in 
a requirement hierarchy. As previously said, we use the same conceptual high-level quality characteristics, namely 
Usability, Functionality, Reliability, Efficiency, Portability, and Maintainability, to follow well-known 
standards. These characteristics give a conceptual and general description of software quality and provide a 
baseline for further decomposition. From these characteristics, we can derive sub-characteristics, and from 
these, we can specify measurable attributes. 

Furthermore, the relative importance of characteristics varies depending on the different users and 
application domains. Accordingly, three views of quality are defined, namely: visitor view, developer 
view, and manager view [ISO 91]. The visitor category can be decomposed, in turn, into two sub-categories: 
general visitors and expert visitors. The former represents a casual or intentional audience, perhaps with a 
general interest and/or minimum domain knowledge; the latter represents a specialist or expert in the domain. 
In addition, from the visitor viewpoint, quality characteristics such as Maintainability and Portability are not 
relevant. Visitors are mainly interested in the site's ease of use and communicativeness, in its browsing and 
search mechanisms, in its coherent navigation mechanisms and domain-dependent expected functionality, and also 
in the site's reliability and efficiency. Thus, in order to assess Web-site quality, the desired combination of 
characteristics and attributes regarding the intended audience should be clearly stated. Figure 1 outlines 
characteristics, sub-characteristics, and more than sixty measurable attributes regarding the general visitor 
standpoint and Web domains that could range from museums and academic sites to electronic commerce. Next, 
we discuss some characteristics and attributes. 

The Usability characteristic is decomposed into sub-factors such as Global Site Understandability, Feedback 
and Help, Interface and Aesthetic, and Miscellaneous Features. The Functionality characteristic is split into 
Searching and Retrieving, Navigability, and Specific Domain issues, and so on. Global Site 
Understandability, in turn, is decomposed into the Global Organization Scheme, Labeling, and Guided Tour 
sub-characteristics; i.e., features mainly available in a home page and that could remain available during 
sub-site navigation. They contribute to a quick, overall understanding of both the structure and the 
content of the Web site. However, the Global Organization Scheme factor, for instance, is still too general to be 
quantifiable; many attributes could be grouped in this sub-characteristic. Hence, we decompose it into attributes 
such as Table of Content, Site Map, etc., which, finally, are measurable. 

By considering a specific domain, we can easily see that not all attributes necessarily need to exist 
simultaneously; a Site Map, or a Table of Content, or an Index may be required. Moreover, an 
index type, for example, could be replaceable depending on the domain. Subject-oriented indexes can be better in 
some circumstances than chronological indexes; besides, more than one index type could be present at any 
time. (Web-site QEM allows modeling simultaneity and replaceability relationships, taking into account 
weights and levels of and/or polarization.) Likewise, we can model a simultaneity relationship in the Web-site 
Search Mechanism. For a given visitor view, it is often better to have both scoped and global 
search; i.e., a customized Scoped Search may be needed to search a (museum) collection by author and 
school, while a Global Search may also be needed to search general issues. Sometimes, specific areas of 
a site are so highly coherent and distinct from the rest of the site that it makes sense to offer a scoped 
(restricted) search to users [Nielsen 99]. However, a basic and advanced global search feature could generally 
be enough. 

In addition, regarding the Reliability factor, the Nondeficiency sub-factor is discussed; that is, the degree to 
which artifacts do not contain undetected errors [IEEE 92]. In this category, and considering Link Errors, 
attributes such as Broken, Invalid, and Unimplemented Links were selected. The Broken Links attribute counts, 
out of the total site links, the dangling links that lead to absent destination nodes. Similarly, the Invalid Links 
attribute counts the found links that lead to wrong or unrelated nodes, and the Unimplemented Links 
attribute counts links that unexpectedly lead back to the same origin node. The higher the number of detected 
link errors, the lower the site Reliability, and consequently the lower the quality. 

1. Usability
  1.1 Global Site Understandability
    1.1.1 Global Organization Scheme
      1.1.1.1 Site Map
      1.1.1.2 Table of Content
      1.1.1.3 Global Indexes
        1.1.1.3.1 Subject Index
        1.1.1.3.2 Alphabetical Index
        1.1.1.3.3 Chronological Index
        1.1.1.3.4 Geographical Index
        1.1.1.3.5 Other Indexes (by audience, by format, hybrid, etc.)
    1.1.2 Quality of Labeling System
      1.1.2.1 Textual Labeling
      1.1.2.2 Iconic Labeling
    1.1.3 Audience-oriented Guided Tour
      1.1.3.1 Conventional Tour
      1.1.3.2 Virtual Tour
    1.1.4 Image Map (Metaphorical, Building, Campus, Floor and Room Imagemaps, etc.)
  1.2 Feedback and Help Features
    1.2.1 Quality of Help Features
      1.2.1.1 Web-site Explanatory Help
      1.2.1.2 Search Help
    1.2.2 Web-site Last Update Indicator
      1.2.2.1 Global
      1.2.2.2 Scoped (per sub-site or page)
    1.2.3 Addresses Directory
      1.2.3.1 E-mail Directory
      1.2.3.2 Phone-Fax Directory
      1.2.3.3 Post mail Directory
    1.2.4 FAQ Feature
    1.2.5 On-line Feedback
      1.2.5.1 Survey/Questionnaire Feature
      1.2.5.2 Guest book
      1.2.5.3 Comments
  1.3 Interface and Aesthetic Features
    1.3.1 Cohesiveness by Grouping Main Control Objects
    1.3.2 Presentation Permanence and Stability of Main Controls
      1.3.2.1 Direct Controls Permanence
      1.3.2.2 Indirect Controls Permanence
      1.3.2.3 Stability
    1.3.3 Style Uniformity
    1.3.4 Aesthetic Preference
  1.4 Miscellaneous Features
    1.4.1 Foreign Language Support
    1.4.2 What's New Feature
    1.4.3 User Profile Detection
    1.4.4 Download Feature
    1.4.5 Screen Resolution Indicator
2. Functionality
  2.1 Searching and Retrieving Issues
    2.1.1 Web-site Search Mechanisms
      2.1.1.1 Scoped Search (e.g. Collections, Books, Academic Personnel, etc.)
      2.1.1.2 Global Search
    2.1.2 Retrieve Mechanisms
      2.1.2.1 Level of Retrieving Customization
      2.1.2.2 Level of Retrieving Feedback
  2.2 Navigation (and Browsing) Issues
    2.2.1 Navigability
      2.2.1.1 Orientation
        2.2.1.1.1 Indicator of Path
        2.2.1.1.2 Label of Current Position
      2.2.1.2 Average of Links per Page
    2.2.2 Navigational Control Objects
      2.2.2.1 Presentation Permanence and Stability of Contextual (sub-site) Controls
        2.2.2.1.1 Contextual Controls Permanence
        2.2.2.1.2 Contextual Controls Stability
      2.2.2.2 Level of Scrolling
        2.2.2.2.1 Vertical Scrolling
        2.2.2.2.2 Horizontal Scrolling
    2.2.3 Navigational Prediction
      2.2.3.1 Link Title (link with explanatory help)
      2.2.3.2 Quality of Link Phrase
  2.3 Domain Specific and Miscellaneous Functions
    2.3.1 Content Relevancy (depending on the domain we should decompose it accordingly)
    2.3.2 Link Relevancy
    2.3.3 Electronic Commerce (valid for some domains. Besides, it can widely be decomposed)
      2.3.3.1 Purchase Features
        2.3.3.1.1 Shopping Basket Facility
        2.3.3.1.2 1-Click Setting
        2.3.3.1.3 Quality of Product Catalog
      2.3.3.2 Secure Transaction
      2.3.3.3 Account Facility
    2.3.4 Image Features
      2.3.4.1 Size Indicator
      2.3.4.2 Zooming (for museums, campus, etc.)
3. Site Reliability
  3.1 Nondeficiency
    3.1.1 Link Errors
      3.1.1.1 Broken Links
      3.1.1.2 Invalid Links
      3.1.1.3 Unimplemented Links
    3.1.2 Miscellaneous Errors or Drawbacks
      3.1.2.1 Deficiencies or absent features due to different browsers
      3.1.2.2 Deficiencies or unexpected results (e.g. non-trapped search errors, frame problems, etc.) independent of browsers
      3.1.2.3 Dead-end Web Nodes
      3.1.2.4 Destination Nodes (unexpectedly) under Construction
4. Efficiency
  4.1 Performance behavior
    4.1.1 Page Size
  4.2 Accessibility
    4.2.1 Information Accessibility
      4.2.1.1 Support for text-only version
      4.2.1.2 Readability by deactivating Browser Image Feature
        4.2.1.2.1 Image Title
        4.2.1.2.2 Global Readability
    4.2.2 Window Accessibility
      4.2.2.1 Number of panes regarding frames
      4.2.2.2 Non-frame Version

Figure 1

3.2 Some Web-site Attributes And Their Metrics 

As said above, for each measurable attribute A_i evaluators can associate a variable X_i, which can take a real 
value. In addition, for each variable it is necessary to establish an acceptable range of values and define a 
function, called the elementary criterion function. The result of this mapping is the elementary quality 
preference, EQ_i. In turn, preferences can be rated in three acceptability levels: satisfactory, marginal, and 
unsatisfactory. 

Figure 2 shows a set of four elementary quality criteria, each represented by a preference scale. For instance, the 
evaluation criterion for the Site Map attribute is a simple discrete binary criterion: we only ask if it is available 
(completely satisfactory) or not (completely unsatisfactory). In contrast, the evaluation criterion for the Foreign 
Language Support attribute follows the formula shown in the figure. The variables 
considered are the number of foreign languages supported by the Web site (e.g., for museums) and the level of 
support (total, partial, or minimum). The resulting value can lie between 0 (completely unsatisfactory) and 
X_max (completely satisfactory). If the measured value of X is above X_max, the corresponding elementary 
preference EQ will be equal to X_max. (The reader can also see the equation used to obtain the elementary 
preference for the Broken Links attribute.) 


1.1.1.1 Site Map
    0 = Not available (i.e., EQ = 0%)
    1 = Available (i.e., EQ = 100%)

1.4.1 Foreign Language Support
    N_i = Number of foreign languages supported
    S_1 = 0.2 -> Minimum support
    S_2 = 1 -> Medium support (not supported in all sub-sites)
    S_3 = 2 -> Total support
    The formula is: $X = FLS = 30 \sum_{i} S_i N_i$
    where, if X > 100, then EQ = X_max = 100

2.1.1.1 Scoped Search (for Museum Collections)
    0 = No search mechanism available
    1 = Search mechanism by Author and/or Keyword Title
    2 = 1 + Expanded Search: search mechanism by School and/or Style and/or Century (or Date)
        and/or Painting and/or Medium

3.1.1.1 Broken Links
    BL = Number of found links that lead to missing destination nodes (also called dangling links)
    TL = Number of total site links
    So, $X = 100 - (BL \cdot 100 / TL) \cdot 10$
    where, if X < 0, then X = X_min = 0
    (This measure was automated using the SiteSweeper tool.)

Figure 2. Elementary quality criteria for the Site Map, Foreign Language Support, Scoped Search, and Broken Links attributes, each mapping the measured variable X to a preference EQ between 0% and 100%. 

On the other hand, the evaluation criterion for a Scoped Search attribute is a multi-level discrete absolute 
criterion defined as a subset, where 0 implies no search mechanism available; 1 implies a basic search 
mechanism (accomplishing 70% of the requirement); and 2 implies the basic and advanced (expanded) search 
mechanism (accomplishing 100% of the requirement). 
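
The four elementary criteria discussed above can be sketched in Python as follows. This is a minimal illustration, not the tooling used in the case studies. All function names are hypothetical. The foreign-language formula is read here as X = 30 * sum(S_i * N_i), with N_i taken to be the number of languages supported at level i; that reading is an interpretation of Figure 2. The sketch also assumes that the elementary preference equals X once X is limited to the range from 0 to 100.

```python
from typing import Dict


def clamp(x: float, low: float = 0.0, high: float = 100.0) -> float:
    """Limit x to the preference scale [low, high]."""
    return max(low, min(high, x))


def site_map_preference(available: bool) -> float:
    # Binary criterion: 0 = not available (0%), 1 = available (100%).
    return 100.0 if available else 0.0


# Support-level weights S_1, S_2, S_3 as listed in Figure 2.
SUPPORT_WEIGHTS = {"minimum": 0.2, "medium": 1.0, "total": 2.0}


def foreign_language_preference(languages_per_level: Dict[str, int]) -> float:
    # Assumed reading of the formula: X = 30 * sum(S_i * N_i), capped at X_max = 100.
    x = 30.0 * sum(SUPPORT_WEIGHTS[level] * count
                   for level, count in languages_per_level.items())
    return clamp(x)


def scoped_search_preference(level: int) -> float:
    # Multi-level discrete criterion: 0 = none, 1 = basic (70%), 2 = basic + expanded (100%).
    return {0: 0.0, 1: 70.0, 2: 100.0}[level]


def broken_links_preference(broken: int, total: int) -> float:
    # X = 100 - (BL * 100 / TL) * 10, with X set to 0 when the result is negative.
    if total <= 0:
        raise ValueError("total number of links must be positive")
    x = 100.0 - (broken * 100.0 / total) * 10.0
    return clamp(x)
```

For example, a site with 3 broken links out of 200 has a broken-link rate of 1.5%, so `broken_links_preference(3, 200)` gives 100 - 15 = 85, while any rate of 10% or more yields 0.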

Once all elementary criteria are agreed upon and the data collected, we can obtain the quality preference for each 
attribute of a system or of competitive systems (the fourth step of Web-site QEM). The global degree of 
satisfaction of all involved characteristics is obtained by logic aggregation of elementary preferences. In the 
fifth step, we use the Logic Scoring of Preference model, which computes the global site preference from the 
elementary ones by applying logic operators based on weighted power means [Dujmovic 96]. 
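
To make the aggregation step concrete, the sketch below computes a weighted power mean of elementary preferences, E = (sum of w_i * e_i^r)^(1/r) with weights that sum to 1. It is a simplified illustration and not the complete LSP model. The function name and the exponent values are illustrative assumptions. An exponent r = 1 gives the arithmetic mean (neutrality), r < 1 moves the result toward the lowest input (simultaneity), and r > 1 moves it toward the highest input (replaceability).

```python
import math
from typing import Sequence


def weighted_power_mean(preferences: Sequence[float],
                        weights: Sequence[float],
                        r: float) -> float:
    """Aggregate preferences (each in [0, 100]) with positive weights summing to 1."""
    if len(preferences) != len(weights):
        raise ValueError("one weight per preference is required")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    # For r <= 0 a single zero input forces the result to zero.
    if r <= 0 and any(e == 0 for e in preferences):
        return 0.0
    if r == 0:
        # Limit case r -> 0: the weighted geometric mean.
        return math.exp(sum(w * math.log(e) for w, e in zip(weights, preferences)))
    return sum(w * e ** r for w, e in zip(weights, preferences)) ** (1.0 / r)


# Two hypothetical elementary preferences with equal weights.
prefs = [80.0, 40.0]
weights = [0.5, 0.5]

neutral = weighted_power_mean(prefs, weights, 1.0)          # arithmetic mean
simultaneity = weighted_power_mean(prefs, weights, -1.0)    # penalizes the weak input
replaceability = weighted_power_mean(prefs, weights, 2.0)   # rewards the strong input
```

With these inputs the neutral result is 60, the conjunctive result is lower (about 53.3), and the disjunctive result is higher (about 63.2). Applying such a function bottom-up over the requirement tree gives one possible form of stepwise aggregation.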



4. Concluding Remarks And Future Directions 

Web developments are continuous and rapidly growing due to the wide acceptance of Web-based systems by 
very different audiences. However, this raises issues such as how to design for quality and cost-effectiveness 
while taking into account the satisfaction of different users' needs and behaviors, or how to assess, interpret 
outcomes, and, ultimately, improve the quality of Web artifacts. One effective strategy to face these issues is 
product (and process) modeling using prescriptive and/or descriptive approaches [Olsina 98b]. Process and 
product modeling potentially supports understanding and communication, evaluation and improvement, and 
control and forecasting. 

In this direction, this work proposes a quantitative evaluation method to assess and compare current 
Web-site quality regarding a user viewpoint. The primary goal was to classify, in a standard-compliant way, quality 
characteristics and attributes for general visitors. This activity (as part of the third process step) implies a 
hierarchical decomposition from the higher level of the tree, the characteristic level, to the lower level of the 
tree, the quantifiable attribute. Hence, the attribute is at the elementary metric level. This requirement 
decomposition framework is easy to understand, powerful, and flexible. It allows deletions, additions, and 
modifications of its components. Moreover, we are arranging characteristics and sub-characteristics to be 
useful for as many Web-site domains as possible regarding specific users. (In fact, the highest-level 
characteristics, such as Usability, Functionality, Reliability, etc., are intended to be domain-independent.) 
Also, as previously said, the relative importance of characteristics varies depending on users and domains. 
Therefore, we have defined three views of quality: visitor view, developer view, and manager view. Thus, from the 
point of view of general visitors, artifact characteristics such as Maintainability and Portability will not be 
taken into account, although from the point of view of developers they might not be excluded. Managers, in turn, 
will be concerned not only with quality but also with cost-effectiveness issues. 

Besides, we have discussed quantitative evaluation criteria for some elementary attributes, and we have 
shown the main activities of the method. One strength of Web-site QEM resides in the modeling of a large number 
of attributes using the LSP approach. We can model simultaneity, replaceability, neutrality, and symmetric and 
asymmetric attribute relationships using logical aggregation operators. At the end of the evaluation and 
comparison process, we obtain for each selected Web system a global indicator on a scale from 0 to 100%. 
This cardinal rating falls into one of three acceptability levels, namely: unsatisfactory (from 0 to 40%), marginal 
(from 40 to 60%), and satisfactory (from 60 to 100%). Ultimately, the rational use of our method should 
help reduce subjectivity in the process by providing a quantitative basis for quality assessment. Furthermore, it 
provides powerful tools and concepts to understand and improve the quality of Web sites. 
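
The three acceptability levels can be expressed as a small classification function. This is a minimal sketch with a hypothetical function name. Because the stated ranges share the boundary values 40% and 60%, the sketch assumes that a boundary value belongs to the higher level; the paper does not specify this.

```python
def acceptability_level(global_preference: float) -> str:
    """Map a global quality preference in [0, 100] to an acceptability level."""
    if not 0.0 <= global_preference <= 100.0:
        raise ValueError("global preference must lie between 0 and 100")
    if global_preference < 40.0:
        return "unsatisfactory"
    # Assumption: exactly 40% counts as marginal and exactly 60% as satisfactory.
    if global_preference < 60.0:
        return "marginal"
    return "satisfactory"
```

For instance, a site with a global preference of 53% would be rated marginal under this reading.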

Finally, we have run one case study on typical, well-known museums [Olsina 99], and another on typical 
academic sites [Olsina et al 99]. Currently, we are running two evaluation projects in the arena of e-commerce. 
In addition, the Web-site QEM includes a step for quality metric validation, both theoretical and 
empirical. Ultimately, this research aims at strengthening the evaluation methodology. 